---
title: "Basic Graphical Displays"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      navbar-bg: "purple"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
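library(flexdashboard) # dashboard layout
library(DT)            # interactive data tables
library(tidyverse)     # readr, dplyr, and ggplot2
library(plotly)        # interactive versions of ggplot2 charts
# Black Friday data downloaded from Kaggle; adjust the path to your copy
Friday <- read_csv("~/Downloads/Black_Friday.csv")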
```
Brief Overview 1
===
Column {data-width=450}
---

In this session, we will use the Black Friday data from [Kaggle](https://www.kaggle.com/datasets/pranavuikey/black-friday-sales-eda) to learn how to make the following graphical displays.


Column {.tabset data-width=550}
-----------------------------------------------------------------------

### Graphical Displays
- Categorical Data
  - Bar Chart
  - Pie Chart
  

- Quantitative Data
  - Histogram
  - Box Plot
  - Scatter Plot
  - Line

### Common Arguments
- col: a vector of colors
- main: title for the plot
- xlim or ylim: limits for the x or y axis
- xlab or ylab: a label for the x or y axis
- font: font used for text, 1=plain; 2=bold; 3=italic; 4=bold italic
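
A minimal sketch of these arguments in a single call, assuming R's built-in `mtcars` data as a stand-in (not the Black Friday data) so that it runs on its own. It uses `font.main` and `font.lab`, the title and axis-label variants of `font`.

```r
# Illustrative only: built-in mtcars data stands in for the Black Friday data.
plot(mpg ~ wt, data = mtcars,
     col = "steelblue",                 # col: point color
     main = "Fuel Economy vs. Weight",  # main: plot title
     xlim = c(1, 6), ylim = c(10, 35),  # xlim/ylim: axis limits
     xlab = "Weight (1000 lbs)",        # xlab: x-axis label
     ylab = "Miles per Gallon",         # ylab: y-axis label
     font.main = 4,                     # bold italic title
     font.lab = 3)                      # italic axis labels
```
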

Brief Overview 2 {data-orientation=rows}
===

Row {data-height=100}
---
In this session, we will use the Black Friday data from [Kaggle](https://www.kaggle.com/datasets/pranavuikey/black-friday-sales-eda) to learn how to make the following graphical displays.

Row {.tabset data-height=900}
---
### Graphical Displays
- Categorical Data
  - Bar Chart
  - Pie Chart
  

- Quantitative Data
  - Histogram
  - Box Plot
  - Scatter Plot
  - Line

### Common Arguments
Here is a list of common arguments:

- col: a vector of colors
- main: title for the plot
- xlim or ylim: limits for the x or y axis
- xlab or ylab: a label for the x or y axis
- font: font used for text, 1=plain; 2=bold; 3=italic; 4=bold italic

Data
=== 

Column {data-width=550}
---

### <b><font size = 4><span Style = "color:blue">First 500 Observations</span></font></b>

```{r show_table}
datatable(Friday[1:500, ], rownames = FALSE,
          colnames = c("User ID", "Product ID", "Gender", "Age", "Occupation", "City Category", "Stay In Current City Years", "Marital Status", "Product Category 1", "Product Category 2", "Product Category 3", "Purchase"),
          options = list(pageLength = 20)) # DataTables options are camelCase


```

Column {data-width=450}
---

### <font size = 4><span Style = "color:red">Description</span></font>

To understand customer purchase behavior across products of different categories, the retail company "ABC Private Limited" in the UK shared a purchase summary of various customers for selected high-volume products from the previous month. The data contain 550,068 rows and the following 12 variables.

- User_ID: User ID
- Product_ID: Product ID
- Gender: Sex of User
- Age: Age in bins
- Occupation: Occupation (masked)
- City_Category: Category of the City (A,B,C)
- Stay_In_Current_City_Years: Number of years the user has stayed in the current city
- Marital_Status: Marital Status
- Product_Category_1: Product Category (Masked)
- Product_Category_2: Another category the product may also belong to (Masked)
- Product_Category_3: Another category the product may also belong to (Masked)
- Purchase: Purchase Amount

```{r}
glimpse(Friday)
```

Bar Chart {data-orientation=rows}
===
Row {data-height=350}
---

###
A bar chart is a graphical display that works well for a general audience. Here, we study how the company's Black Friday purchases are distributed across customer age groups.

**Usage:** barplot(height, ...)

A bar chart can be vertical or horizontal; in barplot(), setting horiz = TRUE draws horizontal bars. Use the argument <span Style="color:orange">col</span> to set the bar colors and the argument <span Style="color:orange">main</span> to set the title of the figure. Colors can be given by name (e.g., "lightblue") or as an RGB hex code (e.g., "#69b3a2").
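
A minimal sketch of a horizontal base R bar chart, assuming the built-in `mtcars` data (cars counted by number of cylinders) as a stand-in for the age groups. `horiz = TRUE` turns the bars sideways, and `rgb()` with `maxColorValue = 255` builds a hex code from red, green, and blue components in 0 to 255; here it yields "#69B3A2", the same color as "#69b3a2" (hex codes are not case-sensitive).

```r
# Illustrative only: counts of cars by cylinder from the built-in mtcars data.
counts <- table(mtcars$cyl)

# Red, green, and blue on a 0-255 scale, converted to a hex color code.
bar_col <- rgb(105, 179, 162, maxColorValue = 255)

# horiz = TRUE draws horizontal bars; las = 1 keeps axis labels horizontal.
mids <- barplot(counts,
                horiz = TRUE,
                las = 1,
                col = bar_col,
                main = "Cars by Number of Cylinders",
                xlab = "Number of Cars",
                ylab = "Cylinders")
```

`barplot()` returns the bar midpoints, which can be reused to place text labels on the bars.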

### Analysis

Row {data-height=650}
---

### **Vertical Bar Chart**
```{r bar1}
par(mgp=c(4,1,0)) #change the margin line for the axis title, axis labels and axis line
par(mar=c(5,7,4,2)) #set margin of the figure
barplot(table(Friday$Age), col = "lightblue", main = "Distribution of Purchases by Customer Age", ylab = "Number of Purchases", xlab = "Age Group")
```
### **Horizontal Bar Chart**
```{r bar2}
bar1 <- Friday %>%
  ggplot(aes(x = Age)) +
  geom_bar(fill = "#69b3a2") +
  coord_flip() + # flip the axes to draw horizontal bars
  labs(title = "Distribution of Purchases by Customer Age",
       x = "Age Groups",
       y = "Number of Purchases")
ggplotly(bar1) # interactive version of the chart
```

Pie Chart
===

Column {data-width=500}
---

**Usage:** pie(x, ...)
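
A sketch of `pie()` with slice labels that combine each category name with its percentage. It assumes the built-in `mtcars` data, tabulated by number of gears, as a stand-in so that the block runs on its own.

```r
# Illustrative only: cars by number of forward gears in the built-in mtcars data.
gears <- table(mtcars$gear)

# Percentage of cars in each slice, rounded to one decimal place.
pct <- round(100 * gears / sum(gears), 1)

# One label per slice: category name plus its percentage.
slice_labels <- paste0(names(gears), " gears: ", pct, "%")

pie(gears, labels = slice_labels,
    col = c("#54d2d2", "#ffcb00", "#f8aa4b"),
    main = "Cars by Number of Gears")
```
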

### Analysis

Column {data-width=500}
---

### Distribution of City Category
```{r pie}
H <- table(Friday$City_Category) # number of purchases in each city category
percent <- round(100 * H / sum(H), 1) # calculate percentages
pie_labels <- paste(percent, "%", sep = "") # include %
pie_colors <- c("#54d2d2", "#ffcb00", "#f8aa4b")
pie(H, main = "Distribution of City Category", labels = pie_labels, col = pie_colors)
legend("topright", names(H), cex = 0.8, fill = pie_colors)
```

Histogram
===

Column {data-width=500}
---

###
A histogram shows the distribution of a quantitative variable. Here we study the distribution of customer purchase amounts.

**Usage:** hist(x, ...)
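
The usage line refers to base R's `hist()`, while the chart in this panel is drawn with ggplot2, so here is a minimal base R sketch. It assumes the built-in `faithful` data (waiting times, in minutes, between eruptions of the Old Faithful geyser) so that it runs on its own. `breaks` is only a suggestion; R may adjust it to get rounded bin boundaries.

```r
# Illustrative only: built-in faithful data (272 eruptions of Old Faithful).
h <- hist(faithful$waiting,
          breaks = 20,          # suggested number of bins
          col = "lightblue",
          main = "Waiting Time Between Eruptions",
          xlab = "Waiting Time (minutes)",
          ylab = "Number of Eruptions")
```

`hist()` also returns the bin boundaries (`h$breaks`) and counts (`h$counts`) invisibly.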

```{r histogram}
Friday %>% ggplot(aes(x = Purchase)) +
  geom_histogram(bins = 30, fill = "blue") + # 30 is the ggplot2 default, set explicitly
  labs(title = "Distribution of Customer Purchase Amount",
       x = "Purchase Amount (British Pounds)",
       y = "Number of Purchases")
```

Column {data-width=500}
---

### Boxplot 1

#### B1

```{r boxplot1}
boxplot(Friday$Purchase, main = "Distribution of Purchase Amount",
        ylab = "Purchase Amount (British Pounds)")
```
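
A boxplot is drawn from five summary statistics. As a sketch on the built-in `mtcars` data (a stand-in so that the block runs on its own), `boxplot.stats()` returns them in order: the end of the lower whisker, the lower hinge, the median, the upper hinge, and the end of the upper whisker. Values more than 1.5 box lengths beyond the box (the default `coef = 1.5`) are returned in `$out` and drawn as individual points.

```r
# Illustrative only: fuel economy (mpg) from the built-in mtcars data.
s <- boxplot.stats(mtcars$mpg)

# Name the five statistics that define the box and whiskers.
five <- setNames(s$stats, c("low_whisker", "low_hinge", "median",
                            "up_hinge", "up_whisker"))
print(five)

# Observations beyond the whiskers, which boxplot() draws as separate points.
print(s$out)

boxplot(mtcars$mpg, main = "Fuel Economy", ylab = "Miles per Gallon")
```
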




### Boxplot 2

#### B2

```{r boxplot2}
# Groups come out as F/0, M/0, F/1, M/1 (Gender varies fastest), matching the names below
boxplot(Purchase ~ Gender + Marital_Status, data = Friday,
        main = "Distribution of Purchase by Sex and Marital Status",
        xlab = "Sex and Marital Status", ylab = "Purchase Amount (British Pounds)",
        cex.lab = 0.75, cex.axis = 0.5,
        names = c("Female & Single", "Male & Single", "Female & Married", "Male & Married"))
```

Column {data-width=450}
---

### Analysis of Boxplot 1


### Analysis of Boxplot 2

Scatterplot
===

Column {data-width=500}
---

###

```{r scatterplot}
plot(mpg ~ wt, data = mtcars,
     main = "Miles per Gallon vs. Weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon",
     pch = 19, col = "blue")
```

Column {data-width=500}
---

### Analysis

Line Plot
===

Column {.tabset data-width=350}
---

### Data

```{r data}
Date<- 13:22
Dayton_OH <- c(84, 86, 91, 89, 89, 91, 92, 91, 91, 91)
Houston_TX <- c(100, 97, 96, 94, 94, 94, 93, 93, 92, 91)
Denver_CO <- c(95, 85, 89, 96, 97, 96, 92, 91, 95, 96)
Fargo_ND <- c(86, 80, 84, 87, 90, 87, 83, 84, 87, 89)
df<-data.frame(Date, Dayton_OH, Houston_TX, Denver_CO, Fargo_ND)
datatable(df, rownames = FALSE, colnames = c("Date", "Dayton, OH", "Houston, TX", "Denver, CO", "Fargo, ND"))
```

### Analysis

Column {data-width=650}
---

### Line Chart

```{r line1}
plot(Date, Dayton_OH, type="o", col="blue", xlab="Date in July", ylab="Highest Temperature", ylim=c(80, 100))
lines(Date, Houston_TX, type="o", col="red")
lines(Date, Denver_CO, type="o", col="purple")
lines(Date, Fargo_ND, type="o", col="darkgreen")
#Add a legend
legend("topright", #position of the legend
       legend = c("Dayton, OH", "Houston, TX", "Denver, CO", "Fargo, ND"), #Labels
       col=c("blue", "red", "purple", "darkgreen"), #Colors
       lty = 1, #Line types
       pch = 1) #Point types
```